AITopics | parameter update

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Neural Information Processing Systems, Jun-22-2026, 13:48:53 GMT

Reinforcement learning (RL) yields substantial improvements in large language models' (LLMs) downstream task performance and alignment with human values. Surprisingly, such large gains result from updating only a small subnetwork comprising just 5%-30% of the parameters, with the rest effectively unchanged. We refer to this phenomenon as parameter update sparsity induced by RL. It is observed across all 7 widely-used RL algorithms (e.g., PPO, GRPO, DPO) and all 10 LLMs from different families in our experiments. This sparsity occurs without any explicit sparsity-promoting regularizations or architectural constraints.
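The sparsity described above can be measured by comparing a checkpoint before and after finetuning. The following is a minimal sketch, not the authors' measurement code: the dictionary-of-lists checkpoint format, the `update_sparsity` name, and the `tol` threshold for "effectively unchanged" are all assumptions made for illustration.

```python
def update_sparsity(before, after, tol=0.0):
    """Return the fraction of parameters whose value did not change.

    `before` and `after` map parameter names to flat lists of floats
    (hypothetical stand-ins for real checkpoint tensors). A parameter
    counts as unchanged when it moved by at most `tol`.
    """
    total = 0
    unchanged = 0
    for name, old_values in before.items():
        new_values = after[name]
        for old, new in zip(old_values, new_values):
            total += 1
            if abs(new - old) <= tol:
                unchanged += 1
    return unchanged / total


# Toy checkpoints: only 2 of 10 parameters move during "finetuning".
pretrained = {
    "layer0.weight": [0.1, -0.2, 0.3, 0.4, 0.5],
    "layer1.weight": [1.0, 1.1, 1.2, 1.3, 1.4],
}
finetuned = {
    "layer0.weight": [0.1, -0.2, 0.35, 0.4, 0.5],
    "layer1.weight": [1.0, 1.1, 1.2, 1.3, 1.5],
}

sparsity = update_sparsity(pretrained, finetuned)
updated_fraction = 1.0 - sparsity
print(f"unchanged: {sparsity:.0%}, updated: {updated_fraction:.0%}")
```

In this toy case 20% of the parameters are updated, which falls inside the 5%-30% range the abstract reports; the numbers themselves are made up.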

large language model, machine learning, sparsity

Country:

North America > Mexico (0.28)
North America > United States > Illinois (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

One-Step Generalization Ratio Guided Optimization for Domain Generalization

Cho, Sumin, Kim, Dongwon, Kim, Kwangsu

arXiv.org Machine Learning, Jun-16-2026

Domain Generalization (DG) aims to train models that generalize to unseen target domains but often overfit to domain-specific features, known as undesired correlations. Gradient-based DG methods typically guide gradients in a dominant direction but often inadvertently reinforce spurious correlations. Recent work has employed dropout to regularize overconfident parameters, but has not explicitly adjusted gradient alignment or ensured balanced parameter updates. We propose GENIE (Generalization-ENhancing Iterative Equalizer), a novel optimizer that leverages the One-Step Generalization Ratio (OSGR) to quantify each parameter's contribution to loss reduction and assess gradient alignment. By dynamically equalizing OSGR via a preconditioning factor, GENIE prevents a small subset of parameters from dominating optimization, thereby promoting domain-invariant feature learning. Theoretically, GENIE balances convergence contribution and gradient alignment among parameters, achieving higher OSGR while retaining SGD's convergence rate. Empirically, it outperforms existing optimizers and enhances performance when integrated with various DG and single-DG methods.

artificial intelligence, machine learning, one-step generalization ratio guided optimization

arXiv:2606.16301

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Continuous Subspace Optimization for Continual Learning

Neural Information Processing Systems, Jun-15-2026, 02:52:53 GMT

Continual learning aims to learn multiple tasks sequentially while preserving prior knowledge, but faces the challenge of catastrophic forgetting when adapting to new tasks. Recently, approaches leveraging pre-trained models have gained increasing popularity in mitigating this issue, due to the strong generalization ability of foundation models. To adjust pre-trained models for new tasks, existing methods usually employ low-rank adaptation, which restricts parameter updates to a fixed low-rank subspace. However, constraining the optimization space inherently compromises the model's learning capacity, resulting in inferior performance. To address this limitation, we propose Continuous Subspace Optimization for Continual Learning (CoSO) to fine-tune the model in a series of subspaces rather than a single one. These sequential subspaces are dynamically determined through the singular value decomposition of the gradients.
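The idea of choosing an update subspace from the singular value decomposition of the gradient can be sketched with NumPy. This is an illustrative sketch only, not the CoSO algorithm: the abstract does not say how often the subspace is refreshed or how forgetting is controlled, so refreshing at every step, the random stand-in gradients, and the helper names `top_subspace` and `project` are assumptions.

```python
import numpy as np


def top_subspace(grad, rank):
    """Orthonormal basis (as columns) of the top-`rank` left singular subspace."""
    u, _, _ = np.linalg.svd(grad, full_matrices=False)
    return u[:, :rank]


def project(grad, basis):
    """Project the gradient onto the column space of `basis`."""
    return basis @ (basis.T @ grad)


rng = np.random.default_rng(0)
weight = rng.normal(size=(8, 6))
lr, rank = 0.1, 2

for step in range(3):
    grad = rng.normal(size=weight.shape)  # stand-in for a real gradient
    basis = top_subspace(grad, rank)      # subspace recomputed at every step
    update = project(grad, basis)         # low-rank update inside that subspace
    weight -= lr * update
```

Each individual update has rank at most `rank`, but because the basis changes between steps, the accumulated change to `weight` is not confined to a single fixed low-rank subspace.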

artificial intelligence, justification, machine learning

Country: Asia > China (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

Neural Information Processing Systems, Jun-13-2026, 11:18:22 GMT

Entropy minimization (EM) trains the model to concentrate even more probability mass on its most confident outputs. We show that this simple objective alone, without any labeled data, can substantially improve large language models' (LLMs) performance on challenging math, physics, and coding tasks. We explore three approaches: (1) EM-FT minimizes token-level entropy similarly to instruction finetuning, but on unlabeled outputs drawn from the model; (2) EM-RL: reinforcement learning with negative entropy as the only reward to maximize; (3) EM-INF: inference-time logit adjustment to reduce entropy without any training data or parameter updates. On Qwen-7B, EM-RL, without any labeled data, achieves comparable or better performance than strong RL baselines such as GRPO and RLOO that are trained on 60K labeled examples. Furthermore, EM-INF enables Qwen-32B to match or exceed the performance of proprietary models like GPT-4o, Claude 3 Opus, and Gemini 1.5 Pro on the challenging SciCode benchmark, while being 3x more efficient than self-consistency and sequential refinement. Our findings reveal that many pretrained LLMs possess previously underappreciated reasoning capabilities that can be effectively elicited through entropy minimization alone, without any labeled data or even any parameter updates.
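The quantity being minimized is the entropy of the model's next-token distribution. The sketch below computes it for a toy logit vector and shows one simple way to lower it at inference time without touching any parameters. Lowering the softmax temperature is used here only as a stand-in; it is an assumption for illustration and not the EM-INF procedure, which the abstract does not spell out.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    norm = sum(exps)
    return [e / norm for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


logits = [2.0, 1.0, 0.5, 0.0]  # hypothetical next-token logits

h_base = entropy(softmax(logits))
h_sharp = entropy(softmax(logits, temperature=0.5))

print(f"entropy at T=1.0: {h_base:.3f} nats")
print(f"entropy at T=0.5: {h_sharp:.3f} nats")
```

Sharpening the distribution moves probability mass toward the already most likely token, so the entropy drops while the top-ranked token stays the same.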

large language model, machine learning, natural language

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Multi-Modal Interactive Agent Layer for Few-Shot Universal Cross-Domain Retrieval and Beyond

Neural Information Processing Systems, Jun-11-2026, 09:20:33 GMT

This paper is the first to address the challenge of few-shot universal cross-domain retrieval (FS-UCDR), enabling machines trained with limited data to generalize to novel retrieval scenarios, with queries from entirely unknown domains and categories. To achieve this, we first formally define the FS-UCDR task and propose the Multi-Modal Interactive Agent Layer (MAIL), which enhances the cross-modal interaction in vision-language models (VLMs) by aligning the parameter updates of target layer pairs across modalities. Specifically, MAIL freezes the selected target layer pair and introduces a trainable agent layer pair to approximate localized parameter updates. A bridge function is then introduced to couple the agent layer pair, enabling gradient communication across modalities to facilitate update alignment. The proposed MAIL offers four key advantages: 1) its cross-modal interaction mechanism improves knowledge acquisition from limited data, making it highly effective in low-data scenarios; 2) during inference, MAIL integrates seamlessly into the VLM via reparameterization, preserving inference complexity; 3) extensive experiments validate the superiority of MAIL, which achieves substantial performance gains over data-efficient UCDR methods while requiring significantly fewer training samples; 4) beyond UCDR, MAIL also performs competitively on few-shot classification tasks, underscoring its strong generalization ability.
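The reparameterization claim (a trainable branch that can be folded into the frozen layer at inference) can be illustrated for a single linear layer. This sketch assumes the agent layer acts as an additive linear branch alongside the frozen layer; that form, the weight values, and the helper names are assumptions, and the bridge function that couples the two modalities is not modeled.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def add_matrices(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


frozen = [[1.0, 2.0], [0.0, -1.0]]  # frozen target-layer weights
agent = [[0.5, 0.0], [0.25, 0.5]]   # trainable agent weights (hypothetical values)
x = [2.0, 4.0]

# Training-time view: the frozen layer and the agent branch run side by side.
two_branch = [f + a for f, a in zip(matvec(frozen, x), matvec(agent, x))]

# Inference-time view: fold the agent weights into the frozen layer once,
# so only a single matrix multiply remains.
merged = add_matrices(frozen, agent)
one_branch = matvec(merged, x)

print(two_branch, one_branch)
```

Because the branch is linear and additive, the merged layer produces the same output as the two-branch version, which is why inference cost stays the same as for the original layer under this assumption.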

artificial intelligence, machine learning, proceedings

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.63)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Towards Provable Emergence of In-Context Reinforcement Learning

Neural Information Processing Systems, Jun-11-2026, 06:35:09 GMT

Typically, a modern reinforcement learning (RL) agent solves a task by updating its neural network parameters to adapt its policy to the task. Recently, it has been observed that some RL agents can solve a wide range of new out-of-distribution tasks without parameter updates after pretraining on some task distribution. When evaluated in a new task, instead of making parameter updates, the pretrained agent conditions its policy on additional input called the context, e.g., the agent's interaction history in the new task. The agent's performance increases as the information in the context increases, with the agent's parameters fixed. This phenomenon is typically called in-context RL (ICRL). The pretrained parameters of the agent network enable the remarkable ICRL phenomenon.
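The distinction between adapting through parameter updates and adapting through context can be made concrete with a toy bandit. In the sketch below the decision rule is a fixed function and only its input, the interaction history, grows. The hand-written rule is a hypothetical stand-in for a pretrained agent network, and the three-armed task with deterministic rewards is invented for illustration.

```python
def policy(context, n_arms):
    """Fixed decision rule conditioned on the interaction history.

    `context` is a list of (arm, reward) pairs. Nothing inside this
    function is ever updated; only the context it receives changes.
    """
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for arm, reward in context:
        totals[arm] += reward
        counts[arm] += 1
    # Try every arm once before exploiting.
    for arm in range(n_arms):
        if counts[arm] == 0:
            return arm
    means = [t / c for t, c in zip(totals, counts)]
    return max(range(n_arms), key=lambda a: means[a])


rewards = [0.2, 0.9, 0.5]  # deterministic rewards of a hypothetical 3-armed task
context = []
actions = []
for _ in range(6):
    arm = policy(context, len(rewards))
    actions.append(arm)
    context.append((arm, rewards[arm]))

print(actions)
```

Once the context contains one observation per arm, the fixed rule settles on the best arm, so performance improves as the context grows even though no parameter changes.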

artificial intelligence, machine learning, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Rethinking Fair Federated Learning from Parameter and Client View

Neural Information Processing Systems, Jun-10-2026, 21:26:22 GMT

Federated Learning is a promising technique that enables collaborative machine learning while preserving participant privacy.

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Uncertainty Estimation for Safety-critical Scene Segmentation via Fine-grained Reward Maximization

Neural Information Processing Systems, Apr-28-2026, 14:26:18 GMT

Uncertainty estimation plays an important role in the reliable future deployment of deep segmentation models in safety-critical scenarios such as medical applications. However, existing methods for uncertainty estimation have been limited by the lack of explicit guidance for calibrating the prediction risk and model confidence. In this work, we propose a novel fine-grained reward maximization (FGRM) framework that addresses uncertainty estimation by directly using a reward function tied to an uncertainty metric, together with a reinforcement-learning-based model tuning algorithm. This benefits the model's uncertainty estimation by providing direct optimization guidance for model calibration. Specifically, our method designs a new uncertainty estimation reward function using the calibration metric, which is maximized to fine-tune an evidential-learning pre-trained segmentation model for calibrating prediction risk.
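A reward derived from a calibration metric can be sketched as follows. The abstract does not say which calibration metric is used or how the reward is made fine-grained, so this example assumes expected calibration error (ECE) with equal-width bins and defines the reward as its negative; both choices and all function names are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """ECE with equal-width confidence bins.

    `confidences` are predicted probabilities in [0, 1]; `correct` holds
    1 for a correct prediction and 0 otherwise.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))
    n = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / n * abs(avg_conf - accuracy)
    return ece


def calibration_reward(confidences, correct):
    """Higher is better: a perfectly calibrated model gets reward 0."""
    return -expected_calibration_error(confidences, correct)


labels = [1, 1, 0, 0]                 # half of the predictions are correct
overconfident = [0.9, 0.9, 0.9, 0.9]  # claims 90% confidence
calibrated = [0.5, 0.5, 0.5, 0.5]     # claims 50% confidence

r_over = calibration_reward(overconfident, labels)
r_cal = calibration_reward(calibrated, labels)
print(r_over, r_cal)
```

Maximizing such a reward pushes predicted confidence toward observed accuracy, which is the kind of direct calibration guidance the abstract describes.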

machine learning, reinforcement learning, uncertainty estimation

Country: Asia > China (0.28)

Industry:

Health & Medicine > Diagnostic Medicine (0.47)
Health & Medicine > Surgery (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Markov locality and relating it to p-locality

Neural Information Processing Systems, Apr-25-2026, 01:16:09 GMT

To gain intuition for how p-locality functions, we will introduce another notion of locality, called Markov locality, which will use the language of Markov blankets. We will prove that under relatively relaxed conditions p-locality and Markov locality are equivalent. This will allow us to relate the notion of locality to various graph structures commonly used to represent probability distributions, and will be a key step in proving Properties 2.1 and 2.2. We start by defining the Markov boundary, M(X,S), of a random variable X contained in a set of random variables S, as a minimal set such that p(X|S) = p(X|M(X,S)). The Markov boundary defines a minimal set of variables such that, conditioned on these variables, conditioning on no additional random variables in S changes the probability of X [39]. Similarly, we define the Markov blanket, M(X,S), for X in S as any set of variables such that conditioning on M(X,S) makes X conditionally independent from all other variables [39]. In this way, the Markov boundary is a Markov blanket, but not all blankets are boundaries. Markov locality: Given a probability distribution p(Z) and a function $f: \mathbb{R}^{N_X + N_\Theta} \to \mathbb{R}^{N_\Theta}$, the update function f(Z) is Markov-local with respect to the distribution p over Z if and only if [condition lost in extraction; the surviving fragment reads "k: Z Ω s.t."]. A Markov boundary can be thought of as the set of variables that 'locally' communicate with the parameter $\Theta_k$, thus providing a natural measure of locality. Importantly, for Markov-locality to be of use, we would like the Markov boundaries of random variables in the model of interest to be unique.
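For one of the graph structures mentioned above, a directed acyclic graph (Bayesian network), a Markov blanket of a node can be read directly off the graph: it consists of the node's parents, its children, and the other parents of its children. The sketch below implements that standard rule; the example graph and the `markov_blanket` name are hypothetical, and whether this blanket is also the minimal Markov boundary depends on conditions on the distribution that the code does not check.

```python
def markov_blanket(node, parents):
    """Markov blanket of `node` in a DAG.

    `parents` maps every node to the set of its parent nodes. The blanket
    is the union of the node's parents, its children, and the children's
    other parents.
    """
    children = {child for child, ps in parents.items() if node in ps}
    co_parents = set()
    for child in children:
        co_parents |= parents[child]
    blanket = set(parents[node]) | children | co_parents
    blanket.discard(node)
    return blanket


# Hypothetical DAG: A -> C <- B, C -> D -> E
dag = {
    "A": set(),
    "B": set(),
    "C": {"A", "B"},
    "D": {"C"},
    "E": {"D"},
}

print(sorted(markov_blanket("C", dag)))
print(sorted(markov_blanket("A", dag)))
```

Note that B belongs to the blanket of A even though the two are not directly connected: they share the child C, so observing C makes them dependent.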

artificial intelligence, logp, machine learning

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: